Word vectors, reuse, and replicability: Towards a community repository of large-text resources
Abstract
This paper describes an emerging shared repository of large-text resources for creating word vectors, including pre-processed corpora and pre-trained vectors for a range of frameworks and configurations. The repository will facilitate reuse, rapid experimentation, and replicability of results.
Similar resources
Comparing methods for automatic acquisition of Topic Signatures
The main goal of this work is to compare two methods for building Topic Signatures, which are vectors of weighted words acquired from large corpora. We used two different software tools, ExRetriever and Infomap, for acquiring Topic Signatures from corpora. Using these tools, we retrieve sense examples from large text collections. Both systems construct a query for each word sense using WordNet. ...
The Addgene repository: an international nonprofit plasmid and data resource
The Addgene Repository (http://www.addgene.org) was founded to accelerate research and discovery by improving access to useful, high-quality research materials and information. The repository archives plasmids generated by scientists, conducts quality control, annotates the associated data and makes the plasmids and their data available to the scientific community. Plasmid associated data under...
Towards Distributed Learning Organizational Memories
This paper presents an analysis of a learning organizational memory and some disadvantages of such a centralized application. One issue is the difficulty of reusing a prototype that gives access to learning resources outside the university where it was created, by teachers of the same domain, even when they are very interested in doing so. One attempt at resolving this problem is to integrate distributed...
AHDS Digital Repository
The Arts and Humanities Data Service (AHDS) was established in 1996 to collect, preserve and encourage the reuse of digital resources created during scholarly research in the arts and humanities. The AHDS is now responsible for the preservation of over 3,000 digital resources and holds a wide range of data types, from plain text and image files to datasets (spreadsheets, databases, statistica...
HESA: The Construction and Evaluation of Hierarchical Software Feature Repository
Nowadays, the demand for software resources at different granularities is becoming prominent in the software engineering field. However, a large quantity of heterogeneous software resources has not been organized in a reasonable and efficient way. Software features, an important kind of knowledge for software reuse, are ideal materials for characterizing software resources. Our preliminary study shows t...